Method and apparatus for monitoring a distributed system.



A method and apparatus for monitoring the behavior over time of a
distributed system (51). Time-stamped data descriptive of events at
one subsystem (53) are placed (15) into a local buffer (59). The
subsystem is notified (21) of the time when a trap condition occurs
at another subsystem (63). Data having a time-stamp within a certain
interval of the occurrence of the trap condition are archived (23)
to provide a history of the system for later analysis. Time is
determined by a local clock (57, 67) in each subsystem. These clocks
are synchronized (25) to ensure accurate correlation between events
at different subsystems. Trap conditions are categorized (31) and
data descriptive of subsystem states are classified (35) to
facilitate selective notification (33) of one or more other
subsystems, and selective retention (37) of the data, depending on
which category of trap condition has occurred and which class of
data has been collected.



[FIG. 2: block diagram of the distributed system (51) with a central
controller (77), a first subsystem (53) and a second subsystem (63).
Labels recoverable from the drawing: buffer, logic, clock, sensor,
control element.]






EP 0 585 479 A1 






BACKGROUND 

This invention relates generally to distributed 
systems and more particularly to a method and 
apparatus for monitoring the behavior of a plurality 
of interacting subsystems of a distributed system 
such as an electronic instrument and computer 
system by means of time-stamped observations. 

A distributed system typically contains a plurality of instruments,
processors, subsidiary computers, and other electronic measurement
and control devices. Such devices are collectively referred to
herein as "subsystems". In many distributed systems, especially in
the area of real time measurement and control, the various
subsystems must be synchronized with one another.

The requirement that the subsystems be synchronized affects both the
design and the debugging of the system. The design of a distributed
system typically includes overall system specification, hardware
construction, and software development, whereas debugging refers to
correcting deviation of actual behavior from expected behavior.
Debugging often occurs in a top-down fashion, 
which means that major state changes in a sub- 
system are monitored until an erroneous state or 
transition is observed, an attempt is made to repli- 
cate the error condition, and subsidiary states or 
transitions are monitored to determine causes and 
effects. This process is continued until the fault is 
isolated. 

Synchronizing the subsystems of a distributed 
system has usually been accomplished by control- 
ling all the subsystems from a central controller in 
a master-slave manner. In such a centrally-con- 
trolled system, the time behavior of each sub- 
system can be observed from the central control- 
ler. 

In a system where the subsystems must act in 
parallel or with relative autonomy, it is difficult to 
observe the time behavior of each subsystem in 
relation to the behavior of the other subsystems. A 
typical approach has been to use external instru- 
ments such as bus analyzers, oscilloscopes and 
logic analyzers to make the desired observations. 
Monitoring the flow of timing and control signals on 
a communication network among the subsystems 
or querying the subsystems often results in disrupt- 
ing the normal time behavior of the system, there- 
by making the observations unreliable. In addition, 
it is difficult to correlate the order of events during 
parallel state transitions and to identify synchro- 
nization and other timing errors. The following are 
among the difficulties that may be encountered: 

(1) timing skew (a result of geographic distribution of the
subsystems),

(2) transmission delay (a delay between the time when an event
occurs and the time when the occurrence of the event is announced),

(3) propagation delay (a delay between the time when a message is
sent and the time when it is received),

(4) message latency (messages may be delivered but not acted upon
immediately),

(5) insufficient state information (context is needed to interpret
messages),

(6) measurement disturbance (interrogation of the system to retrieve
context affects the system behavior), and

(7) rate differences (if events occur at different rates,
identifying trigger conditions and storing state information become
very difficult).

Various aspects of these and other difficulties associated with
monitoring subsystems of a distributed system are discussed, for
example, in U.S. Patent 4,400,783, issued to Locke et al.; U.S.
Patent 4,630,224, issued to Sollman; U.S. Patent 4,703,325, issued
to Chamberlain et al.; Kopetz et al., "Distributed Fault-Tolerant
Real-Time Systems: The Mars Approach," Micro (IEEE), February 1989,
pp. 25-40; and Zieher and Zitterbart, "NETMON - A Distributed
Monitoring System", presented at the Sixth European Fiber Optic
Communications & Local Area Networks Exposition, June 29-July 1,
1988, Amsterdam, The Netherlands.

Tsai, Fang and Chen in "A Noninvasive Architecture to Monitor
Real-Time Distributed Systems", Computer (IEEE), March 1990,
pp. 11-23, have identified some problems of monitoring a distributed
computing system. One such problem is that computations performed in
such a system are nondeterministic and nonreproducible because of
the presence of asynchronous parallel processes. This makes it
difficult or impossible to determine the execution order of
instructions belonging to processes associated with separate
subsystems.

Another problem encountered in monitoring a distributed system is
that the system must comply with timing constraints imposed by
real-world processes carried out on the various subsystems. Thus,
any monitoring activity must not interfere with the real-time
distributed computing environment.

A third problem is that any communications delay between subsystems
can cause improper synchronization among the processors and make it
difficult to determine the actual time of an event and the state of
the system at that time.

From the foregoing it will be apparent that there is a need for a
way to monitor the behavior of each of a plurality of subsystems of
a distributed system as a function of time. The subsystems may be
computers, processors, instruments or other similar devices that
interact with one another. A record of the time at which any unusual
event occurs and the times at which various other events occur
before or after the occurrence of the unusual event must be
preserved for subsequent analysis without interfering with the
operation of the system.

SUMMARY OF THE INVENTION

The present invention provides a method and apparatus for monitoring
the behavior over time of various subsystems of a distributed system
without interfering with normal operation of the system.
Time-stamped data descriptive of events at one subsystem are placed
into a local buffer, the subsystem is notified of the time when a
trap condition occurs, and data having a time-stamp within a certain
interval of the occurrence of the trap condition are archived for
later analysis.
Briefly and in general terms, a method of monitoring a distributed
system according to the invention includes the steps of collecting
time-stamped data respecting the state of one of the subsystems,
placing the data in a buffer, detecting the occurrence of a trap
condition, determining the time at which the trap condition occurs,
notifying the subsystem of the occurrence of the trap condition and
its time of occurrence, and archiving any data having a time-stamp
that is within a desired interval of the time of the occurrence of
the trap condition. The archived data provide a history of the state
of the subsystem.



Typically the occurrence of the trap condition is detected by a
second subsystem and the time of the occurrence is determined by a
local clock in that subsystem. This clock is synchronized with a
local clock in the first subsystem so that events at the first
subsystem can be correlated with the time at which the trap
condition occurs at the second subsystem.

In a preferred embodiment data respecting the states of a subsystem
are classified into any of a plurality of classes, and trap
conditions are categorized into any of a plurality of categories.
This makes it convenient for a subsystem at which trap conditions
are being observed to selectively notify one or more other
subsystems depending on which category of trap condition has
occurred. Similarly, a subsystem which receives a notification
responds selectively according to the category and class
information. For example, the subsystem receiving the notification
may send data from its buffer to be archived at any of several local
or remote storage locations depending on which category of trap
condition has occurred and which class of time-stamped data is to be
archived, or some data may be archived and other data discarded
entirely based on these considerations.

In one embodiment the step of archiving the data also includes
performing an interruptive procedure, such as pausing a subsystem
for a certain interval of time after the trap condition has
occurred, to find out any effects of the interruptive procedure on
any of the subsystems.

A distributed system embodying the invention includes a plurality of
subsystems, each having a sensor, a clock, a buffer, and logic means
such as a local controller for time-stamping data collected by the
sensor, placing the time-stamped data in the buffer, and
communicating with other subsystems. A communication link such as a
direct wired circuit, a modem and a telephone line, or a local area
network carries signals between the subsystems.

If the sensor at one of the subsystems detects the occurrence of an
event, for example a change in the state of the subsystem, data
indicating the occurrence of the event are time-stamped and placed
in the buffer for temporary storage. Then, if a trap condition
occurs, a signal indicating the fact and the time of that occurrence
is sent to the first subsystem. This causes the first subsystem to
archive any data having a time-stamp within a desired interval of
time of the occurrence of the trap condition, for example by
preserving the data in the buffer or by sending it to another memory
or another location such as a central control unit for storage.

Other aspects and advantages of the invention will become apparent
from the following detailed description, taken in conjunction with
the accompanying drawings, illustrating by way of example the
principles of the invention.



BRIEF DESCRIPTION OF THE DRAWINGS 



FIGURE 1 is a flow chart depicting a preferred 
embodiment of a method of monitoring a distrib- 
uted system according to the invention; and 
FIGURE 2 is a block diagram of a distributed 
system configured according to a preferred em- 
bodiment of the invention. 



DETAILED DESCRIPTION 

As shown in the drawings for purposes of illustration, the invention
is embodied in a novel method and apparatus for monitoring the
behavior over time of a plurality of subsystems of a distributed
system. Monitoring the behavior of such a system by intercepting
signals flowing between the subsystems has not been satisfactory
because of the difficulty of correlating parallel events and because
such monitoring tends to disrupt normal operation of the system.

In accordance with the invention, time-stamped data descriptive of
events at one subsystem are placed into a local buffer. When a trap
condition occurs, any data having a time-stamp within a certain
interval of the time of occurrence of the trap condition are
archived for later analysis. Local clocks in the subsystems are
synchronized to ensure that events occurring at the different
subsystems can be accurately correlated. The invention provides a
chronological history of events at various subsystems of the system
being monitored without disrupting normal operation of the system,
facilitating debugging and understanding of the operation of the
system.

As shown in flowchart form in FIGURE 1, a method of monitoring the
behavior over time of a plurality of interacting subsystems of a
distributed system comprises collecting (11) and time-stamping (13)
data respecting the state of a first subsystem, placing (15) the
data in a buffer, detecting (17) the occurrence of a trap condition,
determining (19) the time at which the trap condition occurs,
notifying (21) the first subsystem of the occurrence of the trap
condition and its time of occurrence, and archiving (23) any data
having a time-stamp within a desired interval of the time of the
occurrence of the trap condition, thereby providing a history of the
state of the first subsystem during the desired time interval.
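
The steps of FIGURE 1 can be illustrated with a minimal sketch. It
is not the patented implementation: the class name FirstSubsystem,
the Sample record, the buffer capacity and the example readings are
all assumptions made for illustration, and the sketch archives only
data already in the buffer when the notification arrives.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    timestamp: float  # local clock reading, in seconds (assumed unit)
    state: str        # data respecting the state of the subsystem


class FirstSubsystem:
    """Hypothetical first subsystem: buffers samples, archives on notice."""

    def __init__(self, capacity: int = 1000) -> None:
        self.buffer = deque(maxlen=capacity)  # step 15: local buffer
        self.archive = []                     # step 23: archived history

    def collect(self, timestamp: float, state: str) -> None:
        # Steps 11 to 15: collect, time-stamp and buffer the data.
        self.buffer.append(Sample(timestamp, state))

    def notify(self, trap_time: float, before: float, after: float) -> None:
        # Steps 21 and 23: archive buffered samples inside the window
        # [trap_time - before, trap_time + after].
        lo, hi = trap_time - before, trap_time + after
        self.archive.extend(s for s in self.buffer if lo <= s.timestamp <= hi)


first = FirstSubsystem()
for t, state in [(0.000, "idle"), (0.012, "armed"),
                 (0.019, "firing"), (0.050, "idle")]:
    first.collect(t, state)

# A second subsystem reports a trap condition at t = 0.020 s.
first.notify(trap_time=0.020, before=0.010, after=0.010)
archived_states = [s.state for s in first.archive]
```

Only the samples time-stamped within ten milliseconds of the trap
time end up in the archive; the rest simply age out of the buffer.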

In a typical embodiment the steps of detecting the occurrence of the
trap condition, determining the time at which the trap condition
occurs, and notifying the first subsystem are carried out by a
second subsystem. In this embodiment the method includes the step of
synchronizing (25) a clock reference (27) in the first subsystem
with a clock reference (29) in the second subsystem.

Preferably the trap condition is categorized (31) as belonging to
one of a plurality of categories. The step of notifying of the
occurrence of the trap condition optionally comprises selectively
notifying (33) according to whether the trap condition belongs to a
preselected category. For example, the notification might be sent
only if the detected trap condition is a temperature change as
opposed to a voltage change, or the notification might be sent to
one subsystem if the detected trap condition is a positive
temperature change and to another subsystem if the detected trap
condition is a negative temperature change.
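
Selective notification by category can be sketched as a small
routing table. The category names, the Notifier class and the two
in-memory inboxes standing in for other subsystems are hypothetical
and are not taken from the specification.

```python
from typing import Callable, Dict, List, Tuple

Notification = Tuple[str, float]  # (category, time of occurrence)


def categorize(delta_temperature: float) -> str:
    """Step 31: place a detected temperature change in a category."""
    return "temp_rise" if delta_temperature > 0 else "temp_fall"


class Notifier:
    """Hypothetical router used by the detecting subsystem (step 33)."""

    def __init__(self) -> None:
        self.routes: Dict[str, List[Callable[[Notification], None]]] = {}

    def subscribe(self, category: str,
                  handler: Callable[[Notification], None]) -> None:
        self.routes.setdefault(category, []).append(handler)

    def notify(self, category: str, trap_time: float) -> int:
        # Notify only the subsystems registered for this category.
        handlers = self.routes.get(category, [])
        for handler in handlers:
            handler((category, trap_time))
        return len(handlers)


inbox_a: List[Notification] = []  # stands in for one subsystem
inbox_b: List[Notification] = []  # stands in for another subsystem
notifier = Notifier()
notifier.subscribe("temp_rise", inbox_a.append)
notifier.subscribe("temp_fall", inbox_b.append)

notifier.notify(categorize(+2.5), trap_time=10.0)  # routed to inbox_a
unrouted = notifier.notify("voltage_change", trap_time=11.0)  # no one listens
```

A positive temperature change reaches only the first inbox, and a
category with no subscribers produces no notification at all.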

The time-stamped data preferably are classified (35) as belonging to
one of a plurality of classes and the step of archiving the data
optionally comprises archiving selectively (37) according to the
class of the data. The category of the trap condition may also be
used for selectively archiving the data. For example, data to be
archived may be sent to any of several different local or remote
storage locations according to one or both of these factors. These
storage locations might be a portion of the buffer or of some other
local storage, or a remote memory in a central controller or in
another subsystem. Or the data may be selectively discarded or
archived according to the category of the trap condition or the
class of the data.
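
One way to picture selective archiving is a lookup keyed on the pair
(trap category, data class). The policy table, the storage names and
the sample tuples below are invented for illustration; the
specification does not prescribe any particular data structure.

```python
from typing import Dict, List, Optional, Tuple

Sample = Tuple[str, float, object]  # (data class, time-stamp, value)

# Hypothetical policy: (trap category, data class) -> storage location.
# A missing entry means the data are discarded.
POLICY: Dict[Tuple[str, str], Optional[str]] = {
    ("overvoltage", "temperature"): "central_memory_A",
    ("overvoltage", "status"): "local_store",
    ("timeout", "status"): "central_memory_B",
}


def archive_selectively(category: str, samples: List[Sample],
                        policy: Dict[Tuple[str, str], Optional[str]] = POLICY
                        ) -> Dict[str, List[Tuple[float, object]]]:
    """Step 37: send each sample to the store chosen by category and class."""
    stores: Dict[str, List[Tuple[float, object]]] = {}
    for data_class, timestamp, value in samples:
        destination = policy.get((category, data_class))
        if destination is None:
            continue  # selectively discarded
        stores.setdefault(destination, []).append((timestamp, value))
    return stores


samples = [("temperature", 1.0, 71.5), ("status", 1.1, "ok"),
           ("debug", 1.2, "x")]
result = archive_selectively("overvoltage", samples)
```

Here the "debug" sample has no policy entry and is dropped, while
the other two go to different storage locations.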

The second subsystem communicates (39) with the first subsystem by
any suitable means of communication. The communication is depicted
as unidirectional in FIG. 1, but of course the communication may be
bidirectional. Furthermore, each subsystem may notify the other of
the occurrence of various trap conditions, as will be discussed
presently in more detail.

Although an advantage of the present invention is its ability to
provide a history of the system without disrupting the operation of
the system, in one embodiment the step of archiving the data also
includes deliberately performing (41) an interruptive procedure. For
example, a subsystem may be told to pause for a certain interval of
time after a trap condition has occurred so that the effect of the
pause may be analyzed by examining archived data provided by the
same or a different subsystem during or following the pause.

A distributed system, generally 51, embodying the invention is shown
in FIG. 2. The system includes a first subsystem 53 which has a
sensor 55 for collecting data respecting the state of the subsystem
53, a clock 57, a buffer 59 such as a random access memory, and
logic means 61 such as a local controller responsive to the sensor
and the clock means to time-stamp data collected by the sensor and
to place the time-stamped data in the buffer.

Similarly, a second subsystem 63 has a sensor 65, a clock 67,
optionally a buffer 69, and logic means 71. The logic means 71 is
responsive to the sensor 65 and the clock 67 to determine the time
of occurrence of a trap condition detected by the sensor and to send
a signal notifying the first subsystem 53 of the occurrence of the
trap condition and the time of occurrence.

Communication means indicated by a line 73 carries signals between
the subsystems. The communication means may be, for example, a wire
pair, modems and a telephone line, a local area network, a fiber
optic link, or almost any system for conveying information from one
electronic device to another.

The first logic means 61 archives any data having a time stamp
within a desired interval of the time of occurrence of the trap
condition and thereby provides a history of the state of the first
subsystem 53 during the desired time interval.

The clocks 57 and 67 are preferably synchronized. This is done by
any convenient method. For example, the clocks may communicate
directly with one another as indicated by a line 75, or they may be
synchronized through their respective logic means 61 and 71. In an
alternate embodiment a central controller 77 uses a processor 79 to
regulate the system and to synchronize the clocks. The central
controller 77 has a central memory 81 which may be partitioned into
A and B subparts 83 and 85 for archiving data from the various
subsystems.
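
The specification leaves the synchronization method open. Purely as
an illustration, the sketch below estimates the offset between two
clocks from one round-trip exchange, in the style of Cristian's
algorithm. This is a swapped-in textbook technique, not something
the source describes, and the function names and clock readings are
hypothetical.

```python
def estimate_offset(t_request_sent: float, t_remote: float,
                    t_reply_received: float) -> float:
    """Estimate (remote clock - local clock), assuming symmetric delay.

    t_request_sent and t_reply_received are read from the local clock;
    t_remote is the remote clock reading carried in the reply.
    """
    midpoint = (t_request_sent + t_reply_received) / 2.0
    return t_remote - midpoint


def to_local_time(remote_timestamp: float, offset: float) -> float:
    """Convert a time-stamp from the remote clock to the local time base."""
    return remote_timestamp - offset


# Local clock reads 100.0 at send and 101.0 at receive; remote says 105.5.
offset = estimate_offset(100.0, 105.5, 101.0)
trap_time_local = to_local_time(110.0, offset)
```

With such an offset, a trap time reported by the second subsystem
can be compared directly with time-stamps in the first subsystem's
buffer.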

Only two subsystems 53 and 63 are shown. These subsystems may be
physically collocated or they may be separated by several meters or
even many kilometers and interconnected with each other and with the
central controller by some convenient communication medium as
indicated by a communication channel 87. It will be apparent that a
typical distributed system may have many more than two subsystems
and that any of such subsystems may notify any one or more of the
others of the occurrence of a trap condition, with the result that
some or all of the subsystems receiving the notice may archive data
descriptive of any changes in their respective statuses during
various different time intervals.

Optionally the subsystem 53 includes means such as a control element
89 responsive to the first logic means 61 to interrupt a portion of
the system. The interruption may take various forms such as causing
some part of the system to pause or actively perturbing the system,
for example by injecting a signal or activating a mechanical device,
so that time-stamped data indicative of the states of various
subsystems before and after the interruption can be compared during
the later analysis of the history of the system.

As has already been indicated, a subsystem may be a computer
subsystem such as a terminal, a workstation, or even a large
computer, a measuring instrument such as a voltmeter or a
thermometer, or any device that can generate data and take
measurements or perform similar tasks.

Each of the buffers 59 and 69 may be implemented as a queue or
randomly addressable register through which passes a continuous
stream of data and associated time stamps. At any one time the
buffer 59, for example, will contain a sequence of consecutive
information items and associated time-stamps for a certain time
interval, where the number of events recorded is determined in part
by the length of the register and the rate of arrival of data at the
buffer. Each subsystem such as the subsystem 53 may be attached to
or built into an instrument or other device (not explicitly shown)
that monitors a process occurring in the real world and provides a
stream of measurement data.
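
A bounded queue behaves as the paragraph above describes: new items
push out the oldest ones, and the time interval covered follows from
the buffer length and the arrival rate. The sketch assumes a steady
arrival rate; the helper name covered_interval and the numbers are
illustrative only.

```python
from collections import deque


def covered_interval(buffer_length: int, arrival_rate_hz: float) -> float:
    """Seconds of history held by a full buffer at a steady arrival rate."""
    return buffer_length / arrival_rate_hz


# A four-entry queue fed one item per millisecond keeps the last four.
buffer = deque(maxlen=4)
for i in range(10):
    buffer.append((i * 0.001, f"item{i}"))  # (time-stamp, information item)

oldest_item = buffer[0][1]
newest_item = buffer[-1][1]
span = covered_interval(buffer_length=1000, arrival_rate_hz=100000.0)
```

A register of 1000 entries filled at 100,000 items per second thus
holds roughly ten milliseconds of history.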

Data need not be kept permanently in the buffer, but they must be
kept there long enough to ensure their availability if they are
selected for archiving. For example, if it is desired to identify
and analyze any states assumed by a certain subsystem between ten
milliseconds before and ten milliseconds after the occurrence of a
certain trap condition, then all such data must be kept in the
buffer for at least ten milliseconds. Thus, upon receiving
notification that the trap condition has occurred, all data for the
preceding ten milliseconds are transferred from the buffer into
permanent archival storage. Any data arriving in the buffer within
ten milliseconds thereafter are likewise archived. In addition, it
may be necessary to allow for communication delay. Thus, if it is
known that it may take as much as, say, thirty milliseconds for a
message to arrive, then the data must be kept in the buffer for that
additional amount of time.
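
The retention arithmetic in this example can be written out as a
short sketch. The function names are hypothetical, and the arrival
rate used to size the buffer is an assumed figure that does not
appear in the specification.

```python
import math


def required_retention_ms(pre_window_ms: int,
                          max_comm_delay_ms: int = 0) -> int:
    """Minimum time data must stay in the buffer before being dropped."""
    return pre_window_ms + max_comm_delay_ms


def required_buffer_length(retention_ms: int,
                           arrival_rate_per_ms: float) -> int:
    """Number of buffer entries needed to cover the retention time."""
    return math.ceil(retention_ms * arrival_rate_per_ms)


# Ten milliseconds of history plus up to thirty milliseconds of delay.
retention = required_retention_ms(pre_window_ms=10, max_comm_delay_ms=30)
# Assumed rate of 5 items per millisecond, for illustration only.
entries = required_buffer_length(retention, arrival_rate_per_ms=5)
```

With the figures from the text, data must be held for forty
milliseconds, which at the assumed rate means 200 buffer entries.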

The capability of keeping the data in the buffer long enough to
compensate for communication delays allows the system to
deliberately delay sending a notification of the occurrence of a
trap condition without adversely affecting the monitoring. This
makes it possible to avoid any disruption of system operation such
as might otherwise result from overloading a limited-capacity
communication channel by trying to send the notification of the trap
condition at the same time as some other part of the system also is
attempting to use the same channel.

The logic means 71 contains (or may obtain from the buffer 69 or
from the central controller 77) a predetermined set of "trap
conditions", which are descriptions of particular or unusual
statuses, events, or changes in state. For example, a trap condition
might be receipt or issuance of a control message to another
subsystem, a change in an external state of an instrument to which
the subsystem (acting as a detector) is attached, a change of
internal state from one mode of operation to another, production of
a predetermined data item by an instrument, receipt of an
externally-generated signal, a malfunction, or the passage of a
predetermined amount of time.

Each trap condition is associated with one or more values
descriptive of the state of another subsystem for which a time
history should be preserved for later analysis. For example, if a
voltage exceeds a predetermined magnitude at a certain time, it
might be desired to know the temperature of a certain transistor
during the ten minutes preceding the time at which the voltage
exceeded said predetermined magnitude. Thus, the subsystem which
monitors the voltage would send a notification to the subsystem
which has been keeping a record of the temperature, thereby causing
the latter subsystem to save its temperature records for the
preceding ten minutes.

It will be apparent that a single subsystem could perform both the
function of detecting the trap condition and the function of
collecting and archiving the data. In some distributed systems it
may be advantageous to do this. However, an important advantage of
the invention, specifically the ability to correlate events
occurring at different locations of a system without disrupting the
operation of the system, is not attained if the invention is
embodied in only a single subsystem.

Trap conditions may be categorized. For exam- 
ple, it might be that only an abrupt change of 
voltage is of interest, in which case the voltage 
monitoring subsystem might send notifications only 
if the voltage is increasing at a predetermined rate 
when it exceeds the predetermined magnitude. 
Similarly, data being collected and placed in a 
buffer may be classified. Depending on the clas- 
sification of the data or on the category of the trap 
condition or both, the data may be discarded or 
stored in one or more storage locations. 
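
The abrupt-voltage example can be expressed as a simple predicate
over two consecutive time-stamped readings. The function name, the
thresholds and the readings are assumptions chosen for illustration.

```python
from typing import Tuple

Reading = Tuple[float, float]  # (time in seconds, voltage in volts)


def abrupt_overvoltage(prev: Reading, curr: Reading,
                       limit_volts: float,
                       min_rate_v_per_s: float) -> bool:
    """True if the voltage crosses the limit while rising fast enough."""
    (t0, v0), (t1, v1) = prev, curr
    crossed = v0 <= limit_volts < v1          # magnitude exceeded just now
    rate = (v1 - v0) / (t1 - t0)              # rate of increase
    return crossed and rate >= min_rate_v_per_s


fast = abrupt_overvoltage((0.0, 4.0), (1.0, 6.0),
                          limit_volts=5.0, min_rate_v_per_s=1.0)
slow = abrupt_overvoltage((0.0, 4.9), (1.0, 5.1),
                          limit_volts=5.0, min_rate_v_per_s=1.0)
```

Both readings exceed the limit, but only the first rises quickly
enough to fall into the category that triggers a notification.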

The subsystem that receives notification of a trap condition may act
upon the information immediately, or it may delay acting for some
period of time such as, for example, ten milliseconds or ten
minutes, in order to accumulate information on how the subsystem
responds to the occurrence of the trap condition. Of course, as an
alternate way of accomplishing the same thing, the subsystem that
detects the trap condition may delay sending the notice.

Debugging of the hardware or software respon- 
sible for performance of tasks of a single, isolated 
subsystem is relatively straightforward. The inven- 
tion is of special value in debugging of hardware or 
software responsible for joint or concurrent perfor- 
mance of tasks and interactions between two or 
more subsystems. 

A detailed picture of the time evolution of the entire system of
subsystems can be reconstructed by sorting the various events in
time based upon the time stamps associated with each of these events
as stored by the various subsystems. This information is invaluable
in debugging a system of interacting subsystems or in optimizing the
performance of such a system as a function of time. The information
is presented in any suitable form, such as textual, graphic,
audible, or a direct input to a computer.
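
Reconstructing the system-wide picture amounts to merging the
per-subsystem archives by time stamp. The sketch below assumes the
clocks are already synchronized and that each archive is stored in
time order; the event descriptions are invented.

```python
import heapq

# Each archived event: (time-stamp in seconds, subsystem, description).
first_history = [
    (0.010, "first", "valve opened"),
    (0.030, "first", "valve closed"),
]
second_history = [
    (0.005, "second", "heater on"),
    (0.020, "second", "overvoltage trap"),
]

# heapq.merge lazily interleaves inputs that are each already sorted.
timeline = list(heapq.merge(first_history, second_history,
                            key=lambda event: event[0]))

for timestamp, subsystem, description in timeline:
    print(f"{timestamp:.3f} s  {subsystem:<6}  {description}")
```

If the individual archives were not already ordered, sorting the
concatenated list by time stamp would give the same timeline.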

From the foregoing it will be appreciated that the method and
apparatus of the invention enable a user to observe the time
evolution of the state of a distributed system easily and
conveniently. The user receives a precise time history of the
various events in the system notwithstanding the existence of
asynchronous parallel processes, which in the past have made it
difficult or impossible to determine the execution order of
instructions belonging to processes associated with separate
subsystems. The monitoring does not interfere with any timing
constraints imposed by a need to monitor real-world events in real
time. Delays in communication have no adverse effect. The invention
does not disrupt the normal time behavior of the interacting
subsystems. Complex and cumbersome instruments such as logic
analyzers are not required.

Although certain specific embodiments of the invention have been
described and illustrated, the invention is not to be limited to the
specific forms or arrangements of parts so described and
illustrated, and various modifications and changes can be made
without departing from the scope and spirit of the invention. Within
the scope of the appended claims, therefore, the invention may be
practiced otherwise than as specifically described and illustrated.

Claims

1. A method of monitoring the behavior over time of a plurality of
interacting subsystems of a distributed system, the method
comprising:

collecting data respecting the state of a first subsystem (11);

time-stamping the data (13);

placing the data in a buffer (15);

detecting the occurrence of a trap condition (17);

determining the time at which the trap condition occurs (19);

notifying the first subsystem of the occurrence of the trap
condition and its time of occurrence (21); and

archiving any data having a time-stamp within a desired interval of
the time of the occurrence of the trap condition (23) and thereby
providing a history of the state of the first subsystem during the
desired time interval.

2. A method as in claim 1 and further comprising synchronizing clock
references (25) in a plurality of the subsystems.



3. A method as in claim 1 or 2 and further comprising categorizing
the trap condition as belonging to one of a plurality of categories
(31) and wherein the step of notifying of the occurrence of the trap
condition comprises selectively notifying according to the category
of the trap condition (33).

4. A method as in claim 3 wherein the step of archiving data
comprises selectively archiving according to the category of the
trap condition (37).

5. A method as in any preceding claim and further comprising
classifying the time-stamped data as belonging to one of a plurality
of classes (35) and wherein the step of archiving data comprises
selectively archiving according to the class of the data (37).

6. A method as in claim 5 wherein the step of archiving data
comprises performing an interruptive procedure (41).

7. A distributed system (51) comprising:

a first subsystem (53) including a first sensor (55) operative to
collect data respecting the state of the subsystem, first clock
means (57), a first buffer (59), and first logic means (61)
responsive to the sensor and the clock means to time-stamp data
collected by the sensor and to place the time-stamped data in the
buffer;

a second subsystem (63) including a second sensor (65) operative to
detect the occurrence of a trap condition, second clock means (67),
and second logic means (71) responsive to the sensor and the clock
means to determine the time of occurrence of a trap condition
detected by the sensor and to send a signal notifying the first
subsystem of the occurrence of the trap condition and the time of
occurrence;

means (75) for synchronizing the first and second clock means; and

communication means (73) operative to carry signals between the
subsystems;

the first logic means being operative to archive any data having a
time stamp within a desired interval of the time of occurrence of
the trap condition and thereby provide a history of the state of the
first subsystem during the desired time interval.



8. A distributed system as in claim 7 wherein the second logic means
sends the notification signal selectively according to whether the
trap condition belongs to a preselected category and wherein the
first logic means archives the data selectively according to the
class of the data.



9. A distributed system as in claim 7 or 8 and further comprising
means (89) responsive to the first logic means to interrupt a
portion of the system.

10. A distributed system as in any of claims 7-9 and further
comprising a central controller (77) having a central memory (81),
the first logic means being operative to send any time-stamped data
which is to be archived to the central memory for archival storage
and the communication means being operative to carry signals between
the subsystems and the central controller.